Module identification in bipartite and directed networks 

Modularity is one of the most prominent properties of real-world complex networks. Here, we address the
issue of module identification in two important classes of networks: bipartite networks and directed unipartite 
networks. Nodes in bipartite networks are divided into two non-overlapping sets, and the links must have one 
end node from each set. Directed unipartite networks only have one type of nodes, but links have an origin and 
an end. We show that directed unipartite networks can be conveniently represented as bipartite networks for
module identification purposes. We report a novel approach especially suited for module detection in bipartite 
networks, and define a set of random networks that enable us to validate the new approach. 

Units in physical, chemical, biological, technological, and
social systems interact with each other, defining complex
networks that are neither fully regular nor fully random [1, 2, 3].
Among the most prominent and ubiquitous properties of these
networks is their modular structure [2, 4], that is, the existence
of distinct groups of nodes with an excess of connections to
each other and fewer connections to other nodes in the network.

The existence of modular structure is important in several
regards. First, modules critically affect the dynamic behavior
of the system. The modular structure of the air transportation
system [5], for example, is likely to slow down the spread of
viruses at an international scale [6] and thus somewhat
minimize the effects of high-connectivity nodes that may
otherwise function as "super-spreaders" [7, 8]. Second,
different modules in a complex modular network can have different
structural properties [9]. Therefore, characterizing the
network using only global average properties may result in the
misrepresentation of the structure of many, if not all, of the
modules. Finally, the modular structure of networks is likely
responsible for at least some of the correlations (e.g.,
degree-degree correlations [10, 11, 12, 13]) that have attracted
the interest of researchers in recent years [14].

For the above reasons, considerable attention has been
given to the development of algorithms and theoretical
frameworks to identify and quantify the modular structure of
networks (see [15] and references therein). However, current
research activity has paid little attention, except for a few studies
in sociology [16, 17], to the problem of identifying modules in
a special and important class of networks known as bipartite
networks (or graphs). Nodes in bipartite networks are divided
into two non-overlapping sets, and the links must have one
end node from each set. Examples of systems that are more
suitably represented as bipartite networks include:

• Protein-protein interaction networks [12, 18, 19, 20]
obtained from yeast two-hybrid screening: one set of
nodes represents the bait proteins and the other set
represents the prey or library proteins. Two proteins, a bait
and a library protein, are connected if the library protein
binds to the bait.



• Plant-animal mutualistic networks [21, 22]: one set
represents animal species and the other set represents plant
species. Links indicate mutualistic relationships
between animals and plants (for example, a certain bird
species feeding on a plant species and dispersing its
seeds).

• Scientific publication networks [23, 24, 25]: one set
represents scientists and the other set represents
publications. A link between a scientist and a publication
indicates that the scientist is one of the authors of the
publication.

• Artistic collaboration networks [25, 26, 27]: one set
represents artists and the other represents teams. A link
indicates the participation of an artist in a team.

Another important class of networks for which no sound
module identification methods are available is that of unipartite
directed networks. Examples of directed unipartite networks
include:

• Food webs [28, 29]: nodes represent species and links
indicate trophic interactions in an ecosystem.

• Gene regulatory networks [30]: nodes are genes and
links indicate regulatory interactions.

The usual approach to identify modules in directed net- 
works is to disregard the directionality of the connections, 
which will fail when different modules are defined based on 
incoming and outgoing links. 

Here, we address the issue of module identification in com- 
plex bipartite networks. We start by reviewing the approaches 
that are currently used heuristically and aprioristically to solve 
this problem. We then suggest a new approach especially 
suited for module detection in bipartite networks, and define 
a set of random networks that permit the evaluation of the ac- 
curacy of the different approaches. We then discuss how it 
is possible to use the same formalism to identify modules in 
directed unipartite networks. Our method enables one to in- 
dependently identify groups of nodes with similar outgoing 
connections and groups of nodes with similar incoming con- 
nections. 

I. BACKGROUND

For simplicity, from now on we denote the two sets of 
nodes in the bipartite network as the set of actors and the 
set of teams, respectively. Given a bipartite network, we are 
interested in identifying groups (modules) of actors that are 
closely connected to each other through co-participation in 
many teams. Of course, one is free to select which set of 
nodes in a given network is the "actor set" and which one is 
the "team set," so one can identify modules in either or both 
sets of nodes.

We require any module-identification algorithm to fulfill 
two quite general conditions: (i) the algorithm needs to be 
network independent; and (ii) given the list of links in the net- 
work, the algorithm must determine not only a good partition 
of the nodes into modules, but also the number of modules and 
their sizes. 

The first condition is somewhat trivial. We just make it ex- 
plicit to exclude algorithms that are designed to work with a 
particular network or family of networks, but that will other- 
wise fail with broad families of networks (for example, large 
networks or sparse/dense networks). 

The second condition is much more substantial, as it makes 
clear the difference between the module-identification prob- 
lem and the graph partitioning problem in computer science, 
in which both the number of groups and the sizes of the groups 
are fixed. To use a unipartite network analogy, given a set of 
120 people attending a wedding and information about who 
knows whom, the graph partitioning problem is analogous to 
optimally setting 12 tables with 10 people at each table. In
contrast, the module-identification problem is analogous to 
identifying "natural" groups of people, for example the dif- 
ferent families or distinct groups of friends. 

The second condition also excludes algorithms (based,
for example, on hierarchical clustering or principal
component analysis [31]) that project network data into some
low-dimensional space without specifying the location of the
boundaries separating the groups. For example, given a
dendrogram generated using hierarchical clustering, one still needs
to decide where to "cut it" in order to obtain the relevant
modules. To be sure, one can propose a combination of algorithms
that first project the data into some low-dimensional space and
then set the boundaries, and assess the accuracy of the method.
In general, however, one cannot evaluate the performance of
hierarchical clustering, given that hierarchical clustering does
not provide a single solution to the module-identification
problem. Neither can one test the infinite combinations of
dimensionality reduction algorithms with techniques for the actual
selection of modules.

Freeman [32] has recently compiled a collection of 21
algorithms that have been used in the social networks literature
to identify modules in bipartite networks. To the best of our
understanding, none of the algorithms described there satisfies
the two conditions above. In the statistical physics
community, on the other hand, the common practice is to project
the bipartite network onto a unipartite actors' network, and
then identify modules in the projection. In the scientists'
projection of a scientific publication network, for example, two
scientists are connected if they have coauthored one or more
papers. The caveat of this approach is that, even if the
projection is weighted (by, for example, the number of papers
coauthored by a pair of scientists), some information of the
original bipartite network, like the sizes of the teams, is lost
in the projection. Here, we suggest an alternative to existing
approaches to identify modules in complex bipartite networks.



II. MODULARITY FOR BIPARTITE NETWORKS 

A widely used and quite successful method for the
identification of modules in unipartite networks is the maximization
of a modularity function. Although this method has
limitations [33, 34, 35], it yields the most accurate results reported
in the literature for a wide family of random networks with
prescribed modular structure [15, 36, 37].

In the same spirit, here we define a modularity function that,
upon optimization, yields a partition of the actors in a bipartite
network into modules. By doing this, the module
identification problem becomes a combinatorial optimization problem
that is analogous to the identification of the ground state of a
disordered magnetic system [38, 39].

A ubiquitous modularity function for unipartite networks
is the Newman-Girvan modularity [40]. The rationale behind
this modularity is that, in a modular network, links are not
homogeneously distributed. Thus, a partition with high
modularity is such that the density of links inside modules is
significantly higher than the random expectation for such density.
Specifically, the modularity $\mathcal{M}(\mathcal{P})$ of a
partition $\mathcal{P}$ of a network into modules is

$$\mathcal{M}(\mathcal{P}) = \sum_{s=1}^{N_M} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] , \qquad (1)$$

where $N_M$ is the number of modules, $L$ is the number of links
in the network, $l_s$ is the number of links between nodes in
module $s$, and $d_s$ is the sum of the degrees of the nodes in
module $s$. Then $l_s/L$ is the fraction of links inside module $s$,
and $(d_s/2L)^2$ is an approximation (assuming that self-links
and multiple links between nodes are allowed) to the
fraction of links one would expect to have inside the module from
chance alone.
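
As a minimal illustration of Eq. (1), the following Python sketch computes the modularity of a given partition from a list of undirected links. The function name `unipartite_modularity` and the toy network (two triangles joined by one link) are hypothetical choices made only for this example.

```python
from collections import defaultdict

def unipartite_modularity(edges, module_of):
    """Eq. (1): sum over modules of l_s / L - (d_s / 2L)^2."""
    L = len(edges)
    links_inside = defaultdict(int)   # l_s: links with both ends in module s
    degree_sum = defaultdict(int)     # d_s: sum of degrees of nodes in module s
    for u, v in edges:
        degree_sum[module_of[u]] += 1
        degree_sum[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            links_inside[module_of[u]] += 1
    return sum(links_inside[s] / L - (degree_sum[s] / (2 * L)) ** 2
               for s in degree_sum)

# Toy network (illustrative only): two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
module_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
modularity = unipartite_modularity(edges, module_of)
```

For this partition each module has $l_s = 3$ and $d_s = 7$ with $L = 7$, so the modularity is $2(3/7 - 1/4) = 5/14$, whereas placing all nodes in a single module gives zero.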

We define a new modularity $\mathcal{M}_B(\mathcal{P})$ that can be
applied to identify modules in bipartite networks. We start by
considering the expected number of times that actor $i$ belongs to a
team $a$ composed of $m_a$ actors:

$$m_a \frac{t_i}{\sum_k t_k} , \qquad (2)$$

where $t_i$ is the total number of teams to which actor $i$ belongs.
Similarly, the expected number of times that two actors $i$ and
$j$ belong to team $a$ is

$$m_a (m_a - 1) \frac{t_i t_j}{\left( \sum_k t_k \right)^2} . \qquad (3)$$

Therefore, the average number of teams in which $i$ and $j$ are
expected to be together is

$$\frac{\sum_a m_a (m_a - 1)}{\left( \sum_a m_a \right)^2} \, t_i t_j , \qquad (4)$$

where we have used the identity $\sum_a m_a = \sum_k t_k$. Note that
$\sum_a m_a (m_a - 1)$ and $(\sum_a m_a)^2$ are global network
properties, which do not depend on the pair of actors considered.

Equation (4) enables us to define the bipartite modularity as
the cumulative deviation from the random expectation

$$\mathcal{M}_B(\mathcal{P}) = \sum_{s=1}^{N_M} \left[ \frac{\sum_{i \neq j \in s} c_{ij}}{\sum_a m_a (m_a - 1)} - \frac{\sum_{i \neq j \in s} t_i t_j}{\left( \sum_a m_a \right)^2} \right] , \qquad (5)$$

where $c_{ij}$ is the actual number of teams in which $i$ and $j$ are
together. For convenience, we exclude the irrelevant diagonal
term $i = j$ from the sums [48], and normalize the modularity
so that $\mathcal{M}_B \to 1$ when: (i) all actors in each team belong to a
single module ($\sum_s \sum_{i \neq j \in s} c_{ij} = \sum_a m_a (m_a - 1)$), and (ii)
the random expectation for pairs of nodes being in the same
team is small ($\sum_s \sum_{i \neq j \in s} t_i t_j \ll (\sum_a m_a)^2$).

As in the derivation of Eq. (1), the null model implicit in
Eqs. (2) and (3) is such that one could, in principle, have
multiple connections between an actor and a team. In most cases
this situation would not make sense, so the null model is only
appropriate when $m_a$ and $t_i$ are much smaller than $\sum_a m_a$,
for all $a$ and all $i$.
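
The bipartite modularity of Eq. (5) can be evaluated directly from the list of teams. The sketch below is only an illustration under stated assumptions: the function name `bipartite_modularity` is hypothetical, teams are given as lists of actor identifiers, no actor appears twice in the same team, and at least one team has more than one actor (otherwise the normalization vanishes).

```python
from collections import Counter
from itertools import combinations

def bipartite_modularity(teams, module_of):
    """Eq. (5); `teams` is a list of actor lists, `module_of` maps actor -> module."""
    pair_norm = sum(len(team) * (len(team) - 1) for team in teams)  # sum_a m_a (m_a - 1)
    spots = sum(len(team) for team in teams)                         # sum_a m_a
    t = Counter()   # t_i: number of teams to which actor i belongs
    c = Counter()   # c_ij: number of teams shared by i and j, stored for i < j
    for team in teams:
        t.update(team)
        c.update(combinations(sorted(team), 2))
    total = 0.0
    for i, j in combinations(sorted(module_of), 2):
        if module_of[i] == module_of[j]:
            # Factor 2: the sums in Eq. (5) run over ordered pairs i != j.
            total += 2 * (c[i, j] / pair_norm - t[i] * t[j] / spots ** 2)
    return total

# Illustrative data: two perfectly segregated groups of three actors.
teams = [[0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]]
module_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
m_b = bipartite_modularity(teams, module_of)
```

In this toy case condition (i) holds exactly, so the first term sums to 1, while the random expectation contributes $1/3$; the result is $2/3$, and merging both groups into one module lowers it to $1/6$.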



III. MODEL BIPARTITE NETWORKS WITH MODULAR 
STRUCTURE 

Ensembles of random networks with prescribed modular
structure [40] enable one to assess an algorithm's performance
quantitatively, and thus to compare the performance of
different algorithms. Here, we introduce an ensemble of random
bipartite networks with prescribed modular structure (Fig. 1).

We start by dividing the actors into $N_M$ modules; each
module $s$ comprises $S_s$ nodes. For clarity, we use different
"colors" for different modules. The network is then created
assuming that actors that belong to the same module have a
higher probability of being together in a team than actors that
belong to different modules [49]. Specifically, we proceed by
creating $N_T$ teams as follows:

• Create team $a$.

• Select the number $m_a$ of actors in the team.

• Select the color $c_a$ of the team, that is, the module that
will contribute, in principle, the most actors to the team.

• For each spot in the team: (i) with probability $p$, select
the actor from the pool of actors that have the same
color as the team; (ii) otherwise, select an actor at random
with equal probability. The parameter $p$, which
we call team homogeneity, thus quantifies how homogeneous
a team is. In the limiting cases, for $p = 1$ all
the actors in the team belong to the same module and
modules are perfectly segregated, whereas for $p = 0$
the color of the teams is irrelevant, actors are perfectly
mixed, and the network does not have a modular structure.

FIG. 1: Model random bipartite networks with modular structure.
(a) Nodes are divided into two sets, actors (circles) and teams
(rectangles). Each color represents a different module in the actors' set,
and teams of a given color are more likely to contain actors of their
color (see text). (b) Two sample networks with $N_M = 4$ modules,
with 16 actors (circles) each, and $N_T = 64$ teams (diamonds), with
$m = 7$ actors each. The network on the left has a strong modular
structure, $p = 0.9$, while the modular structure is less well defined
on the right, $p = 0.5$ (see text for the definition of $p$).
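
The construction steps listed above can be sketched in Python as follows. This is a hedged illustration rather than the code used for the results: the function name `make_model_network` is hypothetical, the team color is drawn with equal probability for each module, the random choice in step (ii) is taken over all actors, and we assume that an actor already in the team is simply redrawn so that every team has distinct members.

```python
import random

def make_model_network(module_sizes, n_teams, team_size, p, rng=None):
    """Return (teams, module_of) for one realization of the model."""
    rng = rng or random.Random()
    if team_size > min(module_sizes):
        raise ValueError("team_size must not exceed the smallest module size")
    modules, module_of = [], {}
    for s, size in enumerate(module_sizes):
        members = list(range(len(module_of), len(module_of) + size))
        modules.append(members)
        module_of.update((actor, s) for actor in members)
    all_actors = list(module_of)
    teams = []
    for _ in range(n_teams):
        color = rng.randrange(len(modules))   # equal probability for each color
        team = set()
        while len(team) < team_size:          # assumption: duplicates are redrawn
            pool = modules[color] if rng.random() < p else all_actors
            team.add(rng.choice(pool))
        teams.append(sorted(team))
    return teams, module_of

# Four modules of 32 actors and 128 teams of 14 actors, perfectly segregated.
teams, module_of = make_model_network([32] * 4, n_teams=128, team_size=14,
                                      p=1.0, rng=random.Random(0))
```

With $p = 1$ every team draws all of its members from a single module; lowering $p$ mixes actors of different colors into each team.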



IV. RESULTS 

We next investigate the performance of different module 
identification algorithms in both model networks with pre- 
defined modular structure, and in a simple real network that 
shows some interesting features. 

We consider three approaches for the identification of
modules in bipartite networks. First, we consider the unweighted
projection (UWP) approach. Within this approach, we start by
building the projection of the bipartite network onto the actors'
space. Then we treat the projection as a regular unipartite
network and use the modularity given in Eq. (1).
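
A minimal sketch of the unweighted projection is given below; the function name `unweighted_projection` and the two example teams are hypothetical. Two actors are linked if they share at least one team, so the number of shared teams is discarded.

```python
from itertools import combinations

def unweighted_projection(teams):
    """Return the sorted list of actor pairs that share at least one team."""
    links = set()
    for team in teams:
        links.update(combinations(sorted(set(team)), 2))
    return sorted(links)

# Actors 1 and 2 share both teams, but the projection keeps a single link.
projection = unweighted_projection([[0, 1, 2], [1, 2, 3]])
```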

Next, we consider the weighted projection (WP) approach.
Within this approach, we start by building the weighted
projection of the bipartite network. In the weighted projection,
actors are connected if they are together in one or more teams,
and the weight $w_{ij}$ of the link indicates the number of teams
in which the two actors are together (thus, $w_{ij} = c_{ij}$). We
then use the simplest generalization to weighted networks of
the modularity in Eq. (1),

$$\mathcal{M}_W(\mathcal{P}) = \sum_{s=1}^{N_M} \left[ \frac{w_s^{\rm int}}{W} - \left( \frac{w_s^{\rm all}}{2W} \right)^2 \right] , \qquad (6)$$

where $W = \sum_{i>j} w_{ij}$, $w_s^{\rm int}$ is the sum of the weights of the
links within module $s$, and $w_s^{\rm all} = \sum_{i \in s} \sum_j w_{ij}$.
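
The weighted projection and the modularity of Eq. (6) can be sketched as follows. The function names `weighted_projection` and `weighted_modularity` are hypothetical, and teams are assumed to be lists of distinct actors.

```python
from collections import Counter
from itertools import combinations

def weighted_projection(teams):
    """Return w_ij, the number of teams shared by actors i < j."""
    weights = Counter()
    for team in teams:
        weights.update(combinations(sorted(set(team)), 2))
    return weights

def weighted_modularity(weights, module_of):
    """Eq. (6): sum over modules of w_int / W - (w_all / 2W)^2."""
    W = sum(weights.values())
    w_int = Counter()   # weight of links inside each module
    w_all = Counter()   # total strength of the nodes of each module
    for (i, j), w in weights.items():
        w_all[module_of[i]] += w
        w_all[module_of[j]] += w
        if module_of[i] == module_of[j]:
            w_int[module_of[i]] += w
    return sum(w_int[s] / W - (w_all[s] / (2 * W)) ** 2 for s in w_all)

# Illustrative data: two perfectly segregated groups of three actors.
teams = [[0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]]
module_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
m_w = weighted_modularity(weighted_projection(teams), module_of)
```

Here $W = 12$, and each module has $w_s^{\rm int} = 6$ and $w_s^{\rm all} = 12$, which gives $\mathcal{M}_W = 2(1/2 - 1/4) = 1/2$.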

Finally, we consider the bipartite (B) approach. Within this
approach, we consider the whole bipartite network and use the
modularity introduced in Eq. (5).

In all cases, we maximize the modularity using simulated
annealing [41]. Several alternatives have been suggested to
maximize the modularity, including greedy search [42],
extremal optimization [43], and spectral methods [44, 45]. In
general, there is a trade-off between accuracy and
execution time, with simulated annealing being the most accurate
method [15], but at present too slow to deal properly with
networks comprising hundreds of thousands or millions of nodes.
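
To make the optimization step concrete, the sketch below shows a deliberately simplified simulated annealing loop. It is not the implementation used for the results reported here: it only proposes single-node moves, uses a plain geometric cooling schedule, and all names and parameter values (`anneal`, `t_start`, `t_end`, `cooling`, `sweeps`) are hypothetical. Any of the modularity functions can be passed as `score`; Eq. (1) is used here to keep the example self-contained.

```python
import math
import random

def modularity(edges, module_of):
    """Eq. (1) for an undirected edge list."""
    L = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        for node in (u, v):
            degree[module_of[node]] = degree.get(module_of[node], 0) + 1
        if module_of[u] == module_of[v]:
            inside[module_of[u]] = inside.get(module_of[u], 0) + 1
    return sum(inside.get(s, 0) / L - (d / (2 * L)) ** 2 for s, d in degree.items())

def anneal(nodes, score, t_start=0.1, t_end=1e-4, cooling=0.9, sweeps=5, rng=None):
    """Return (best partition found, its score)."""
    rng = rng or random.Random()
    n_labels = len(nodes)        # upper bound only; modules may remain empty
    module_of = {v: rng.randrange(n_labels) for v in nodes}
    current = score(module_of)
    best, best_score = dict(module_of), current
    temperature = t_start
    while temperature > t_end:
        for _ in range(sweeps * len(nodes)):
            v = rng.choice(nodes)
            old = module_of[v]
            module_of[v] = rng.randrange(n_labels)
            proposed = score(module_of)
            # Metropolis rule: always accept gains, sometimes accept losses.
            if (proposed >= current
                    or rng.random() < math.exp((proposed - current) / temperature)):
                current = proposed
                if current > best_score:
                    best, best_score = dict(module_of), current
            else:
                module_of[v] = old   # reject the move
        temperature *= cooling
    return best, best_score

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nodes = list(range(6))
best, best_score = anneal(nodes, lambda part: modularity(edges, part),
                          rng=random.Random(1))
```

Because the number of labels is only an upper bound, the number and sizes of the modules are outcomes of the optimization rather than inputs, in line with the second condition stated in Sec. I.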



A. Model bipartite networks 

We consider the performance of the different module
identification approaches when applied to the model bipartite
networks described above. We assess the performance of an
algorithm by comparing the partitions it returns to the predefined
group structure. Specifically, we use the mutual information
$I_{AB}$ [15] between partitions $A$ and $B$ to quantify the
performance of the algorithms

$$I_{AB} = \frac{-2 \sum_{i=1}^{N_M^A} \sum_{j=1}^{N_M^B} n_{ij}^{AB} \log \left( \frac{n_{ij}^{AB} S}{n_i^A n_j^B} \right)}{\sum_{i=1}^{N_M^A} n_i^A \log \left( \frac{n_i^A}{S} \right) + \sum_{j=1}^{N_M^B} n_j^B \log \left( \frac{n_j^B}{S} \right)} . \qquad (7)$$

Here, $S$ is the total number of nodes in the network, $N_M^A$ is the
number of modules in partition $A$, $n_i^A$ is the number of nodes
in module $i$ of partition $A$, and $n_{ij}^{AB}$ is the number of nodes
that are in module $i$ of partition $A$ and in module $j$ of partition
$B$. The mutual information between partitions $A$ and $B$ is 1 if
both partitions are identical, and 0 if they are uncorrelated.
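
A short Python sketch of the normalized mutual information between two partitions follows. The function name `mutual_information` is hypothetical, partitions are given as dictionaries from node to module label, and returning 1 when both partitions consist of a single module (where the expression is 0/0) is our own convention for this example.

```python
import math
from collections import Counter

def mutual_information(part_a, part_b):
    """Normalized mutual information between two partitions of the same nodes."""
    nodes = list(part_a)
    S = len(nodes)
    n_a = Counter(part_a[v] for v in nodes)                  # n_i^A
    n_b = Counter(part_b[v] for v in nodes)                  # n_j^B
    n_ab = Counter((part_a[v], part_b[v]) for v in nodes)    # n_ij^AB
    numerator = -2.0 * sum(n * math.log(n * S / (n_a[i] * n_b[j]))
                           for (i, j), n in n_ab.items())
    denominator = (sum(n * math.log(n / S) for n in n_a.values())
                   + sum(n * math.log(n / S) for n in n_b.values()))
    if denominator == 0:
        return 1.0   # assumption: two single-module partitions count as identical
    return numerator / denominator

reference = {0: "a", 1: "a", 2: "b", 3: "b"}
relabeled = {0: "x", 1: "x", 2: "y", 3: "y"}     # same grouping, other labels
unrelated = {0: "x", 1: "y", 2: "x", 3: "y"}     # independent of the reference
same = mutual_information(reference, relabeled)
none = mutual_information(reference, unrelated)
```

The measure depends only on how nodes are grouped, not on the labels of the modules, so a relabeled copy of a partition scores 1.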

In the simplest version of the model all modules have the
same number of nodes, all teams have the same size, and the
color of each team is set assuming equal probability for each
color. Unless otherwise stated, we build networks with $N_M = 4$
modules, each of them comprising 32 actors, and $N_T = 128$
teams of size $m = 14$.



1. Team homogeneity 

We first investigate how team homogeneity $p$ affects
algorithm performance. For $p = 1$, all the actors in a team
belong to the same module, and any reasonable algorithm must
perfectly identify the modular structure of the network; thus
$I = 1$. Conversely, for $p = 0$, actors are perfectly mixed in
teams, and all algorithms will return random partitions due to
small fluctuations [33]; thus $I = 0$. Any $p > 0$ will provide a
signal that an algorithm can, in principle, extract.

As shown in Fig. 2(a), the UWP approach performs
systematically and significantly worse than the weighted projection
and the bipartite algorithms for all values of $p$. For the choice
of parameters described above, the last two algorithms start
to be able to identify the modular structure of the network for
$p \approx 0.35$. For $p > 0.5$, one already finds $I > 0.9$. The WP
and the B approaches yield indistinguishable results.

FIG. 2: Algorithm performance as a function of: (a) team
homogeneity $p$ (simulation parameters: $N_M = 4$, $S_s = 32$ for all modules);
(b) number of teams $N_T$ (simulation parameters: $N_M = 4$, $S_s = 32$
for all modules); (c) module size homogeneity $h$ (simulation
parameters: $N_M = 6$, 132 nodes); and (d) mean team size $\mu$ (simulation
parameters: $N_M = 4$, $S_s = 32$ for all modules). Error bars indicate
the standard error.

2. Number of teams and average team size 

Team homogeneity is not the only parameter affecting
algorithm performance. For example, the number of teams $N_T$ in
the network critically affects the amount of information
available to an algorithm. Interestingly, the number of teams
affects the UWP approach on the one hand, and the WP and B
approaches on the other, in different ways [Fig. 2(b)]. For the
WP and B algorithms, the larger $N_T$, the larger the amount
of information and therefore the easier the problem becomes.
Indeed, even for very small values of $p$, the signal-to-noise
ratio can become significantly greater than 1 if $N_T$ is large
enough. On the contrary, as the number of teams increases
the unweighted projection becomes denser and denser and eventually
becomes a fully connected graph, from which the algorithm cannot
extract any useful information. Once more, the performances of
the WP and the B approaches are indistinguishable.

3. Module size heterogeneity 

In real networks, modules will have (sometimes
dramatically) different sizes [46]. Given the sizes of the modules
in a network, and assuming that they are ordered so that
$S_1 \geq S_2 \geq \cdots \geq S_{N_M}$, we define $h$ as the ratio of sizes
between consecutive modules (with integer rounding)

$$h = \frac{S_i}{S_{i+1}} . \qquad (8)$$

Additionally, we select the color of the teams with
probabilities proportional to the size of the corresponding module, so
that all actors participate, on average, in the same number of
teams.

As we show in Fig. 2(c), we again observe that the WP and
the B approaches perform similarly, and clearly outperform the
UWP approach for all values of $h$.

4. Team size distribution 

All the results so far suggest that the WP approach and the
B approach yield results that are indistinguishable from each
other. We know, however, that differences do exist between
the two. The distribution of team sizes, in particular, is taken
into account in the B approach but disregarded in the WP
approach, and "teams" with $m = 1$ are totally disregarded in
projection-based approaches, but not in the B approach.

We thus investigate the effect of the team size
distribution on the performance of the algorithms. Instead of
considering that all teams have the same size $m$, we now consider
a distribution $p(m)$ of team sizes. In particular, we consider a
(displaced) geometric distribution

$$p(m) = \frac{1}{\mu} \left( 1 - \frac{1}{\mu} \right)^{m-1} , \quad m \geq 1 , \qquad (9)$$

which is the discrete counterpart of the exponential
distribution. The distribution has mean $\langle m \rangle = \mu$.
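
As a quick check of Eq. (9), the sketch below evaluates the distribution and draws team sizes from it by inverting the cumulative distribution. The function names `team_size_pmf` and `sample_team_size` are hypothetical, and inversion sampling is our own choice for this example; the text does not specify how the sizes were drawn.

```python
import math
import random

def team_size_pmf(m, mu):
    """Eq. (9): probability of a team of size m >= 1 for mean team size mu >= 1."""
    return (1.0 / mu) * (1.0 - 1.0 / mu) ** (m - 1)

def sample_team_size(mu, rng=random):
    """Draw one team size by inverting the cumulative distribution."""
    if mu < 1:
        raise ValueError("mu must be at least 1")
    if mu == 1:
        return 1                     # all probability mass sits on m = 1
    u = rng.random()                 # uniform in [0, 1)
    # P(m > k) = (1 - 1/mu)^k, so m - 1 = floor(log(1 - u) / log(1 - 1/mu)).
    return 1 + int(math.log(1.0 - u) / math.log(1.0 - 1.0 / mu))

mu = 14
norm = sum(team_size_pmf(m, mu) for m in range(1, 5000))
mean = sum(m * team_size_pmf(m, mu) for m in range(1, 5000))
rng = random.Random(0)
draws = [sample_team_size(mu, rng) for _ in range(1000)]
```

Summing the distribution numerically confirms that it is normalized and that its mean equals $\mu$.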

As we show in Fig. 2(d), some small differences seem to
appear between the WP approach and the B approach, although
it is difficult to establish conclusively whether these differences
are significant.

In the light of this, we investigate in more depth the
relationship between the bipartite modularity in Eq. (5) and the
weighted extension of the unipartite modularity in Eq. (6). As
we show in the Appendix, the bipartite modularity actually
reduces to the weighted unipartite modularity (up to an
irrelevant additive constant) when all teams in the bipartite network
have the same size.

This observation explains why the WP and the B approaches
differ when teams have unequal sizes [50]. Although our
results suggest that each approach outperforms the other in
certain cases, we believe that Eq. (5) is, in general, preferable
because it explicitly takes into account the distribution of team
sizes, while the weighted projection does not.



B. Southern women dataset 

During the 1930s, ethnographers Allison Davis, Elizabeth
Stubbs Davis, J. G. St. Clair Drake, Burleigh B. Gardner,
and Mary R. Gardner collected data on social stratification in
the town of Natchez, Mississippi [32, 47]. Part of their field
work consisted of collecting data on women's attendance at
social events in the town. The researchers later analyzed the
resulting women-event bipartite network in the light of other
social and ethnographic variables. Since then, the dataset has
become a de facto standard for discussing bipartite networks
in the social sciences [32].

Here we analyze the modules of both women and events.
We start by considering the unweighted projection of the
network in the women's space (two women are connected if
they co-attended at least one event), and in the events' space
(two events are connected if at least one woman was in both
events). As we show in Fig. 3(a), the unweighted projection
does not capture the true modular structure of the network.
The failure of this approach is due to the fact that the
projections are very dense. For example, some central events were
attended by most women and thus most pairs of women are
connected in the projection.

As we show in Fig. 3(b), the weighted projection approach
and the bipartite approach yield exactly the same results, which
do capture the two-module structure of the network. Except
for one woman, the partition coincides with the original
subjective partition proposed by the ethnographers who collected
the data, and is in perfect agreement with some of the
supervised algorithms reviewed in Ref. [32].



V. MODULES IN DIRECTED NETWORKS 

Another important class of networks for which no
satisfactory module identification algorithm has so far been proposed
is directed unipartite networks. In order to tackle this class of
networks, we note that directed networks can be conveniently
represented as bipartite networks where each node $i$ is
represented by two nodes $A_i$ and $B_i$. A directed link from $i$ to $j$
would be represented in the bipartite network as an edge
connecting $A_i$ to $B_j$.

FIG. 3: Modular structure of the Southern women dataset [32, 47].
Circles represent women and diamonds represent social events. A
woman and an event are connected if the woman attended the
event. (a) Modular structure as obtained from the unweighted
projection (UWP) approach. (b) Modular structure as obtained from the
weighted projection (WP) approach and the bipartite (B) approach.
The UWP approach fails to capture the real modular structure of the
network.

FIG. 4: Application of the bipartite approach to the identification
of modules in directed networks. (a, b) A directed model network.
A link from node $i$ to node $j$ is established according to the
probabilities in the matrix in (a). For example, there is a probability $p_i$
that there is a link from node 1 to node 13. In particular, we use
$p_i = 0.45 > p = 0.05$ to generate the directed network in (b). (c)
Bipartite representation of the network in (b). Each node $i$ in (b) is
represented by two nodes here, a circle $A_i$ and a square $B_i$. All links
in the bipartite network run between circles and diamonds, and a link
between $A_i$ and $B_j$ corresponds to a link from $i$ to $j$ in the directed
network. (d) Modules identified in the bipartite network. (e)
Modules identified from the directed network disregarding link direction.
Here, we use the same color for $A_i$ and $B_i$, since this approach does
not make distinctions between incoming and outgoing links.

Consider, for example, a network in which nodes are com- 
panies and links represent investments of one company into 
another. By considering each company as two different ob- 
jects, one that makes investments and one that receives invest- 
ments, the directed network can be represented as an undi- 
rected bipartite network. Modules in the set of objects that 
make investments correspond to groups of companies that in- 
vest in the same set of companies, that is, groups of companies 
with a similar investing strategy. 
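
In terms of the actor-team language used earlier, the nodes $A_i$ play the role of actors and each node $B_j$ plays the role of a team containing every $A_i$ that links to $j$; swapping the two roles yields the teams needed for in-modules. The following sketch illustrates this mapping; the function names `out_teams` and `in_teams` and the company labels are hypothetical.

```python
def out_teams(directed_links):
    """Teams for out-modules: team B_j contains every A_i with a link i -> j."""
    teams = {}
    for i, j in directed_links:
        teams.setdefault(j, set()).add(i)
    return {j: sorted(members) for j, members in teams.items()}

def in_teams(directed_links):
    """Teams for in-modules: swap the roles of the two node sets."""
    return out_teams([(j, i) for i, j in directed_links])

# Hypothetical investments: company i invests in company j.
links = [("x", "u"), ("y", "u"), ("x", "v"), ("y", "v"), ("z", "w")]
by_target = out_teams(links)
by_source = in_teams(links)
```

In this toy example companies x and y invest in the same targets, so they share every team in `by_target` and would be natural candidates for the same out-module.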



The most widely used approach to identify communities in 
directed networks is to simply disregard the directionality of 
the links and identify modules using a method suitable for 
undirected unipartite networks. This method might work in 
some situations, but will fail when different modules are de- 
fined based on incoming and outgoing links. 

Consider, for instance, the simple model network depicted
in Figs. 4(a, b). According to the outgoing links of the nodes,
this network has two modules: nodes 1-12 and nodes 13-24.
According to the incoming links of the nodes, the network
also has two modules, but they are different: nodes 1-6 and
13-18 on the one hand, and nodes 7-12 and 19-24 on the other.
As we show in Fig. 4(c), a layout of the corresponding
bipartite network already makes clear the modular structure of the
network, and any of the approaches described above (UWP,
WP, and B) is able to identify the in-modules and out-modules
correctly [Fig. 4(d)]. Disregarding the direction of the links,
however, results in modules that fail to capture the modular
structure of the network [Fig. 4(e)].



VI. DISCUSSION

In this work, we have focused on approaches that aim at
identifying modules in each of the two sets of nodes in the
bipartite network independently. There are two main reasons
for this choice. First, methodologically our choice enables
comparison with projection-based algorithms, which, by
definition, cannot identify modules of actors and teams
simultaneously. Second, in most situations it is reasonable to
assume that two actors belong to the same module if they
co-participate in many teams, regardless of whether the teams
themselves belong to the same module or not. An alternative
approach, however, would be to group nodes in both sets at
the same time.

Another interesting observation relates to the optimization
algorithm used to maximize the modularity. Although we
have chosen to use simulated annealing to obtain the best
possible accuracy [15, 36, 37], one can trivially use the new
modularity introduced in Eq. (5) with faster algorithms such as
greedy search [42] or extremal optimization [43].

Interestingly, one can also use the spectral methods
introduced in [44, 45]. Indeed, just as the unipartite modularity
$\mathcal{M}(\mathcal{P})$, the bipartite modularity $\mathcal{M}_B(\mathcal{P})$ can be rewritten in
matrix form as

$$\mathcal{M}_B(\mathcal{P}) = \mathbf{g}^T \mathbf{B} \mathbf{g} , \qquad (10)$$

where $g_{is} = 1$ if node $i$ belongs to module $s$ and 0 otherwise,
and the elements of the modularity matrix $\mathbf{B}$ are defined as

$$B_{ij} = \begin{cases} \dfrac{c_{ij}}{\sum_a m_a (m_a - 1)} - \dfrac{t_i t_j}{\left( \sum_a m_a \right)^2} , & i \neq j \\ 0 , & i = j . \end{cases} \qquad (11)$$

Even more importantly, by sampling all local maxima of
the modularity in Eq. (5) one can study not only the most
modular partition of the network, but also the hierarchical
structure of nested modules and submodules [34] within each set
of nodes in the bipartite network. This is particularly relevant
taking into account that the most modular partition of a
network may, in some cases, not represent the most "relevant"
division of its nodes [33, 34].

Finally, a few words are necessary on the comparison
between the different approaches. First, we have shown that the
(so far "default") unweighted projection approach is not
reliable and can lead, in most situations, to incorrect results.
Therefore, we believe that this approach should not be used.
As for the weighted projection approach and the bipartite
approach, we have shown that their performance is very similar,
and that they are actually equivalent when all teams in the
bipartite network have the same size. We have also pointed out,
however, that they can and do give noticeably different results
when team sizes are not uniform. Given this, we believe that
the bipartite approach has a more straightforward
interpretation and would be preferable in cases in which the modular
structure of the network is unknown.

APPENDIX A: WEIGHTED UNIPARTITE MODULARITY
AND BIPARTITE MODULARITY FOR BIPARTITE
NETWORKS WITH UNIFORM TEAMS

Next, we demonstrate that, when all teams in a bipartite
network have the same size $m$, the bipartite modularity is
equivalent to the modularity of the weighted projection.

We consider the usual weighted projection, in which each
pair of nodes $i \neq j$ is connected by a link whose weight $w_{ij}$
equals the number of times that $i$ and $j$ are together in a team;
using our previous notation, $w_{ij} = c_{ij}$. No self-links are
included in the projection.

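In this projection, and when all teams have the same
number of actors $m_a = m$, the constant team-size factors in
Eq. (5) become

$$\sum_a m_a (m_a - 1) = N_T m (m - 1) = 2W \qquad (A1)$$

$$\sum_a m_a = N_T m = \frac{2W}{m - 1} , \qquad (A2)$$

where, as before, $W = \sum_{i>j} w_{ij}$.

Each time an actor is in a team, the total weight of its links
in the projected network increases by $(m - 1)$, so that
$w_i^{\rm all} \equiv \sum_j w_{ij} = (m - 1) t_i$. Using this and
the identities above, we obtain

$$\frac{\sum_{i \neq j \in s} c_{ij}}{\sum_a m_a (m_a - 1)} = \frac{w_s^{\rm int}}{W} \qquad (A3)$$

$$\frac{\sum_{i \neq j \in s} t_i t_j}{\left( \sum_a m_a \right)^2} = \sum_{i \in s} \sum_{j \in s, j \neq i} \frac{w_i^{\rm all}}{2W} \frac{w_j^{\rm all}}{2W} \qquad (A4)$$

$$= \left( \frac{w_s^{\rm all}}{2W} \right)^2 - \sum_{i \in s} \left( \frac{w_i^{\rm all}}{2W} \right)^2 . \qquad (A5)$$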

Once the summation over modules is carried out, the last
term is simply a constant independent of the partition, and is
therefore irrelevant. Thus, up to an irrelevant constant, when
all teams in a bipartite network have the same size, the
bipartite modularity in Eq. (5) is equivalent to the weighted
modularity in Eq. (6).
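
The equivalence can also be checked numerically. The sketch below evaluates Eqs. (5) and (6) for the same partition and verifies, on a small hypothetical example with uniform team size $m = 3$, that their difference is the same for two different partitions. The function name `both_modularities` and the example data are illustrative, and each team is assumed to contain distinct actors.

```python
from itertools import permutations

def both_modularities(teams, module_of):
    """Return (M_B, M_W) from Eqs. (5) and (6) for teams with distinct members."""
    actors = sorted(module_of)
    t = {i: sum(i in team for team in teams) for i in actors}
    c = {(i, j): sum(i in team and j in team for team in teams)
         for i, j in permutations(actors, 2)}
    pair_norm = sum(len(team) * (len(team) - 1) for team in teams)
    spots = sum(len(team) for team in teams)
    two_w = sum(c.values())                     # ordered pairs, hence 2W
    strength = {i: sum(c[i, j] for j in actors if j != i) for i in actors}
    m_b = m_w = 0.0
    for s in set(module_of.values()):
        pairs = [(i, j) for i, j in permutations(actors, 2)
                 if module_of[i] == s and module_of[j] == s]
        m_b += (sum(c[pair] for pair in pairs) / pair_norm
                - sum(t[i] * t[j] for i, j in pairs) / spots ** 2)
        w_int = sum(c[pair] for pair in pairs) / 2
        w_all = sum(strength[i] for i in actors if module_of[i] == s)
        m_w += w_int / (two_w / 2) - (w_all / two_w) ** 2
    return m_b, m_w

teams = [[0, 1, 2], [0, 1, 3], [2, 4, 5], [3, 4, 5], [1, 2, 3]]   # all of size 3
partition_1 = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
partition_2 = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"}
m_b1, m_w1 = both_modularities(teams, partition_1)
m_b2, m_w2 = both_modularities(teams, partition_2)
gap_1, gap_2 = m_b1 - m_w1, m_b2 - m_w2
```

For these teams the gap equals $\sum_i (w_i^{\rm all}/2W)^2 = 156/900$ for both partitions, as expected from Eq. (A5).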